[ML] Resolve duplicate key exception in GetDatafeedRunningStateAction #125477

hye-on · 2025-03-24T08:49:00Z

This PR fixes issue #104160 where a duplicate key exception occurs in GetDatafeedRunningStateAction.Response.fromResponses(). The issue happens when a datafeed is force-stopped and restarted before its local task cancellation completes. This creates a situation where two local tasks for the same datafeed temporarily coexist on the ML node (one cancelling, one starting), causing the duplicate key error when both report their state.
The solution implements a merge function in the toMap collector that selects the most appropriate state when duplicates are found, based on the searchInterval data.

Select the most appropriate state based on:

1. Prefer state with more recent searchInterval.startMs when both exist
2. Prefer states with searchInterval over those without
3. Default to second state when all criteria are equal
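
For readers without the diff open, the rules above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the merged code: the `SearchInterval`, `RunningState`, and `TaskReport` records and the `withoutMerge`/`withMerge` helpers are hypothetical stand-ins, and only the name `selectMostRecentState` comes from this thread. It shows why the two-argument `Collectors.toMap` fails when two tasks report for the same datafeed, and how a merge function avoids that.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DatafeedStateMergeSketch {

    // Hypothetical stand-ins for the real Elasticsearch classes (assumption).
    record SearchInterval(long startMs, long endMs) {}

    record RunningState(boolean realTimeRunning, SearchInterval searchInterval) {}

    record TaskReport(String datafeedId, RunningState state) {}

    // Applies the three selection rules listed in the PR description.
    static RunningState selectMostRecentState(RunningState first, RunningState second) {
        SearchInterval a = first.searchInterval();
        SearchInterval b = second.searchInterval();
        if (a != null && b != null) {
            // Rule 1: the more recent startMs wins; a tie falls to the second state (rule 3).
            return a.startMs() > b.startMs() ? first : second;
        }
        if (a != null) {
            // Rule 2: only the first state has a search interval.
            return first;
        }
        // Rule 2 (only the second has an interval) or rule 3 (neither has one).
        return second;
    }

    // Two-argument toMap: throws IllegalStateException on a duplicate key.
    static Map<String, RunningState> withoutMerge(List<TaskReport> reports) {
        return reports.stream()
            .collect(Collectors.toMap(TaskReport::datafeedId, TaskReport::state));
    }

    // Three-argument toMap: duplicates are resolved by the merge function.
    static Map<String, RunningState> withMerge(List<TaskReport> reports) {
        return reports.stream()
            .collect(Collectors.toMap(
                TaskReport::datafeedId,
                TaskReport::state,
                DatafeedStateMergeSketch::selectMostRecentState));
    }

    public static void main(String[] args) {
        // One task still cancelling, one newly started, both for the same datafeed.
        RunningState cancelling = new RunningState(false, new SearchInterval(1_000L, 2_000L));
        RunningState starting = new RunningState(true, new SearchInterval(5_000L, 6_000L));
        List<TaskReport> reports = List.of(
            new TaskReport("my-datafeed", cancelling),
            new TaskReport("my-datafeed", starting));

        try {
            withoutMerge(reports);
        } catch (IllegalStateException e) {
            System.out.println("Without merge function: " + e.getMessage());
        }
        System.out.println("With merge function: " + withMerge(reports));
    }
}
```

In this sketch the newly started task wins because its search interval starts later, which matches the intent of preferring the most recent state.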

Comment

I'm new to Elasticsearch and open source contributions in general. I went with searchInterval.startMs as the selection criteria, but I'd appreciate any feedback on whether there might be a better approach for handling these duplicate states. Thank you for your guidance! :)

Fixes #104160

Implement merge function for duplicate datafeed states when a datafeed is force-stopped and restarted before cancellation completes.

Select the most appropriate state based on:

1. Prefer state with more recent searchInterval.startMs when both exist
2. Prefer states with searchInterval over those without
3. Default to second state when all criteria are equal

elasticsearchmachine · 2025-03-26T09:15:34Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine · 2025-03-31T07:34:03Z

Pinging @elastic/ml-core (Team:ML)

davidkyle · 2025-03-31T09:50:25Z

Thanks for the contribution @hye-on and welcome to Elasticsearch

Prefer state with more recent searchInterval.startMs when both exist
Prefer states with searchInterval over those without
Default to second state when all criteria are equal

The logic you've applied makes perfect sense to me. This is a practical solution given that searchInterval may be null in some situations.

Please can you add a unit test to cover the logic in the Response::selectMostRecentState function? Create a new test in GetDatafeedRunningStateActionResponseTests.
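
As a rough idea of the cases such a test could cover, here is a plain-Java sketch with one check per selection rule. It is not the test added in this PR: the records, the `check` helper, and the local copy of `selectMostRecentState` are hypothetical stand-ins so the example runs without the Elasticsearch test framework, and the real test class would presumably use the project's own test base classes and assertions.

```java
public class SelectMostRecentStateChecks {

    // Hypothetical stand-ins for the real Elasticsearch classes (assumption).
    record SearchInterval(long startMs, long endMs) {}

    record RunningState(boolean realTimeRunning, SearchInterval searchInterval) {}

    // Local copy of the selection rules described in this PR, for illustration only.
    static RunningState selectMostRecentState(RunningState first, RunningState second) {
        SearchInterval a = first.searchInterval();
        SearchInterval b = second.searchInterval();
        if (a != null && b != null) {
            return a.startMs() > b.startMs() ? first : second;
        }
        return a != null ? first : second;
    }

    // Tiny assertion helper so the sketch does not depend on a test library.
    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError(description);
        }
    }

    static void runChecks() {
        RunningState older = new RunningState(true, new SearchInterval(1_000L, 2_000L));
        RunningState newer = new RunningState(true, new SearchInterval(5_000L, 6_000L));
        RunningState noInterval = new RunningState(false, null);

        // Rule 1: the more recent startMs wins regardless of argument order.
        check(selectMostRecentState(older, newer) == newer, "newer wins as second argument");
        check(selectMostRecentState(newer, older) == newer, "newer wins as first argument");

        // Rule 2: a state with a search interval beats one without.
        check(selectMostRecentState(older, noInterval) == older, "interval beats null (first)");
        check(selectMostRecentState(noInterval, older) == older, "interval beats null (second)");

        // Rule 3: when nothing distinguishes the states, the second one is returned.
        RunningState sameStart = new RunningState(false, new SearchInterval(1_000L, 9_000L));
        check(selectMostRecentState(older, sameStart) == sameStart, "equal startMs picks second");
        RunningState alsoNoInterval = new RunningState(true, null);
        check(selectMostRecentState(noInterval, alsoNoInterval) == alsoNoInterval, "both null picks second");
    }

    public static void main(String[] args) {
        runChecks();
        System.out.println("all checks passed");
    }
}
```

Checking each rule in both argument orders guards against an implementation that only works when the newer task happens to report last.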

davidkyle · 2025-03-31T09:50:46Z

@elasticmachine test this please

hye-on · 2025-03-31T14:11:30Z

@davidkyle I’ll add the test! I’ll reach out if I need any help. Thank you for the review! :)

hye-on · 2025-03-31T15:19:50Z

@davidkyle I’ve added the tests! Thank you!

davidkyle · 2025-04-01T08:17:34Z

@elasticmachine test this please

davidkyle

LGTM

Thanks for the contribution

elasticsearchmachine added the needs:triage, v9.1.0, and external-contributor labels Mar 24, 2025

hye-on changed the title ~~Resolve duplicate key exception in GetDatafeedRunningStateAction~~ [ML] Resolve duplicate key exception in GetDatafeedRunningStateAction Mar 26, 2025

AI-IshanBhatt added the :Search Relevance/Search label Mar 26, 2025

elasticsearchmachine added the Team:Search Relevance label Mar 26, 2025

elasticsearchmachine removed the needs:triage label Mar 26, 2025

valeriy42 added the :ml, Team:ML, and >bug labels Mar 31, 2025

Merge branch 'main' into fix/datafeed-duplicate-key

7b58bac

davidkyle removed the Team:Search Relevance and :Search Relevance/Search labels Mar 31, 2025

davidkyle self-assigned this Mar 31, 2025

[CI] Auto commit changes from spotless

01c3f9c

Add unit tests for Response state merge logic in GetDatafeedRunningSt…

e4ffd8e

…ateAction

davidkyle approved these changes Apr 1, 2025

davidkyle merged commit 89adec1 into elastic:main Apr 1, 2025
17 of 18 checks passed
